Hybrid approaches for automatic vowelization of Arabic texts
نویسندگان
چکیده
Hybrid approaches for automatic vowelization of Arabic texts are presented in this article. The process is made up of two modules. In the first one, a morphological analysis of the text words is performed using the open source morphological Analyzer AlKhalil Morpho Sys. Outputs for each word analyzed out of context, are its different possible vowelizations. The integration of this Analyzer in our vowelization system required the addition of a lexical database containing the most frequent words in Arabic language. Using a statistical approach based on two hidden Markov models (HMM), the second module aims to eliminate the ambiguities. Indeed, for the first HMM, the unvowelized Arabic words are the observed states and the vowelized words are the hidden states. The observed states of the second HMM are identical to those of the first, but the hidden states are the lists of possible diacritics of the word without its Arabic letters. Our system uses Viterbi algorithm to select the optimal path among the solutions proposed by Al Khalil Morpho Sys. Our approach opens an important way to improve the performance of automatic vowelization of Arabic texts for other uses in automatic natural language processing.
منابع مشابه
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملThe Role of Morphology and Short Vowelization in Reading Morphological Complex Words in Arabic: Evidence for the Domination of the Morpheme/Root-Based Theory in Reading Arabic
This study investigated the reading accuracy of 59 adult highly skilled native Arabic readers in reading morphological complex Arabic words in 6 reading conditions: Isolated words with short vowelization, isolated words without short vowelization, sentences with roots with short vowelization, sentences with roots without short vowelization, sentences without priming roots with short vowelizatio...
متن کاملAn Automatic Punctuation Marks System For Arabic Texts
This work presents a system for Automatic Arabic punctuation marks. Existing approaches for automatic punctuation marks do not provide suitable performance for and do not satisfy user interests in Arabic texts. The importance and rising need to automate the correct insertion of punctuation marks in Arabic texts led to a need of specific analysis of the Arabic language to introduce approaches th...
متن کاملUsing Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
متن کاملStatistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems
Vowelization presents a principle difficulty in building text-tospeech synthesizers for speech-to-speech translation systems. In this paper, a novel log-linear modeling method is proposed that takes into account vowel and diacritical information at both the word level and character level. A unique syllable based normalization algorithm is then introduced to enhance both word coverage and data c...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1410.2646 شماره
صفحات -
تاریخ انتشار 2014